Search results for "graphics processing units"

showing 10 items of 21 documents

GPU accelerated Monte Carlo simulations of lattice spin models

2011

We consider Monte Carlo simulations of classical spin models of statistical mechanics using the massively parallel architecture provided by graphics processing units (GPUs). We discuss simulations of models with discrete and continuous variables, and using an array of algorithms ranging from single-spin flip Metropolis updates over cluster algorithms to multicanonical and Wang-Landau techniques to judge the scope and limitations of GPU accelerated computation in this field. For most simulations discussed, we find significant speed-ups by two to three orders of magnitude as compared to single-threaded CPU implementations.

cluster algorithms; Statistical Mechanics (cond-mat.stat-mech); Computer science; Computation; Numerical analysis; spin models; Monte Carlo method; High Energy Physics - Lattice (hep-lat); FOS: Physical sciences; Statistical mechanics; GPU computing; Physics and Astronomy(all); Computational Physics (physics.comp-ph); generalized-ensemble simulations; Monte Carlo simulations; Computational science; CUDA; High Energy Physics - Lattice; Spin model; General-purpose computing on graphics processing units; Graphics; Physics - Computational Physics; Condensed Matter - Statistical Mechanics

Towards an Efficient Implementation of an Accurate SPH Method

2020

A modified version of the Smoothed Particle Hydrodynamics (SPH) method is considered in order to overcome the loss of accuracy of the standard formulation. The summation of Gaussian kernel functions is employed, using the Improved Fast Gauss Transform (IFGT) to reduce the computational cost, while tuning the desired accuracy in the SPH method. This technique, coupled with an algorithmic design for exploiting the performance of Graphics Processing Units (GPUs), makes the method promising, as shown by numerical experiments.
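
The Gaussian kernel summation mentioned above can be illustrated with a direct evaluation. This is a minimal sketch of the exact sum that a fast Gauss transform approximates, not the authors' IFGT or GPU implementation. The function name, the argument names, and the bandwidth convention (dividing by `h**2`) are assumptions made for illustration.

```python
import numpy as np

def direct_gauss_transform(sources, weights, targets, h):
    """Evaluate G(y_j) = sum_i q_i * exp(-||y_j - x_i||^2 / h^2) directly.

    Cost is O(n_targets * n_sources); fast Gauss transforms exist to
    approximate this sum at lower cost for a chosen accuracy.
    """
    sources = np.asarray(sources, dtype=float)
    weights = np.asarray(weights, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Pairwise squared distances, shape (n_targets, n_sources).
    diff = targets[:, None, :] - sources[None, :, :]
    sq_dist = np.sum(diff * diff, axis=-1)
    # Weighted sum of Gaussians centred on the sources.
    return np.exp(-sq_dist / h**2) @ weights
```

A direct sum like this is mainly useful as a reference against which an approximate transform can be checked.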

Computer science; Gauss transform; Order (ring theory); Smoothed Particle Hydrodynamics Improved Fast Gauss Transform Graphics Processing Units; Smoothed-particle hydrodynamics; Smoothed Particle Hydrodynamics; symbols.namesake; Improved Fast Gauss Transform; Gaussian function; symbols; Algorithm design; Graphics Processing Units; Graphics; Algorithm; ComputingMethodologies_COMPUTERGRAPHICS

CUSHAW2-GPU: Empowering Faster Gapped Short-Read Alignment Using GPU Computing

2014

We present CUSHAW2-GPU to accelerate the CUSHAW2 algorithm using compute unified device architecture (CUDA)-enabled GPUs. Two critical GPU computing techniques, namely intertask hybrid CPU-GPU parallelism and tile-based Smith-Waterman map backtracking using CUDA, are investigated to facilitate fast alignments. By aligning both simulated and real reads to the human genome, our aligner yields comparable or better performance compared to BWA-SW, Bowtie2, and GEM. Furthermore, CUSHAW2-GPU with a Tesla K20c GPU achieves significant speedups over the multithreaded CUSHAW2, BWA-SW, Bowtie2, and GEM on the 12 cores of a high-end CPU for both single-end and paired-end alignment.
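
For readers unfamiliar with Smith-Waterman, the dynamic-programming recurrence behind it can be sketched as follows. This is a plain CPU illustration that returns only the best local alignment score. It does not reproduce the tile-based backtracking or the hybrid CPU-GPU scheme of CUSHAW2-GPU, and the scoring values (`match`, `mismatch`, linear `gap`) are hypothetical defaults chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of strings a and b.

    Uses a linear gap penalty and keeps only two rows of the DP matrix,
    so no alignment path is recovered (no backtracking).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Recovering the alignment itself requires storing or recomputing the matrix for backtracking, which is the memory-heavy step that a tiled approach is meant to address.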

Backtracking; Computer science; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Short read; Computational science; CUDA; Parallel processing (DSP implementation); Hardware and Architecture; Parallelism (grammar); Electrical and Electronic Engineering; General-purpose computing on graphics processing units; Software; ComputingMethodologies_COMPUTERGRAPHICS; IEEE Design & Test

On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone arrays

2015

Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human-machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate …
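
The pairwise building block of PHAT-weighted localization can be sketched as below. SRP-PHAT is commonly described as accumulating such phase-transformed cross-correlations over microphone pairs for every candidate grid point. This example shows only a two-signal delay estimate, under the assumption of a circular (wrap-around) delay, and is not the multi-GPU implementation the abstract refers to. The function name and the small regularization constant are assumptions.

```python
import numpy as np

def gcc_phat_delay(ref, sig):
    """Estimate the circular delay (in samples) of `sig` relative to `ref`."""
    n = len(ref)
    cross = np.fft.rfft(sig) * np.conj(np.fft.rfft(ref))
    # Phase transform: discard magnitudes, keep only the phase.
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.irfft(cross, n=n)
    lag = int(np.argmax(corr))
    # Lags in the upper half of the buffer correspond to negative delays.
    return lag - n if lag > n // 2 else lag
```

In a grid search, the correlation value at the lag implied by each candidate position would be read out and summed across pairs, which is why the cost grows with both grid resolution and microphone count.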

Signal processing; Reverberation; Computer science; Microphone; Real-time computing; General Engineering; Acoustic source localization; Sound power; computer.software_genre; Grid; Expert system; Microphone arrays; Computer Science Applications; Sound source localization; Noise; Artificial Intelligence; TEORIA DE LA SEÑAL Y COMUNICACIONES; CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL; Graphics Processing Units; computer; Steered Response Power

Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model

2010

A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Computational Physics 228 (2009) 4468–4477] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors up to 35 compared to an optimized single central processing unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message P…
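
The checkerboard idea can be sketched in a few lines of NumPy, with array operations standing in for GPU threads. This is an illustrative sketch only: it does not include multi-spin coding or any multi-GPU decomposition. It assumes coupling J = 1, zero external field, periodic boundaries, and even lattice dimensions, and the function name is hypothetical.

```python
import numpy as np

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a 2D Ising lattice, updated in place.

    Sites of one colour have all four neighbours on the other colour,
    so every site of a colour can be updated simultaneously.
    """
    if spins.shape[0] % 2 or spins.shape[1] % 2:
        raise ValueError("lattice dimensions must be even")
    ii, jj = np.indices(spins.shape)
    for parity in (0, 1):
        mask = (ii + jj) % 2 == parity
        # Sum of the four nearest neighbours with periodic wrap-around.
        nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
              + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        d_energy = 2 * spins * nn  # energy change if the spin is flipped
        accept = rng.random(spins.shape) < np.exp(-beta * d_energy)
        spins[mask & accept] *= -1
    return spins
```

Because the two colours are updated one after the other, the neighbour sums are recomputed between the half-sweeps, which mirrors the two kernel launches a GPU version would typically need per sweep.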

FOS: Computer and information sciences; Computer science; Monte Carlo method; Graphics processing unit; FOS: Physical sciences; General Physics and Astronomy; Mathematical Physics (math-ph); Parallel computing; GPU cluster; Computational Physics (physics.comp-ph); Graphics (cs.GR); Computational science; CUDA; Computer Science - Graphics; Hardware and Architecture; Ising model; Central processing unit; General-purpose computing on graphics processing units; Massively parallel; Physics - Computational Physics; Mathematical Physics

GSaaS: A Service to Cloudify and Schedule GPUs

2018

Cloud technology is an attractive infrastructure solution that provides customers with an almost unlimited on-demand computational capacity using a pay-per-use approach, and allows data centers to increase their energy and economic savings by adopting a virtualized resource sharing model. However, resources such as graphics processing units (GPUs) have not been fully adapted to this model. Although general-purpose computing on graphics processing units (GPGPU) is becoming more and more popular, cloud providers lack the flexibility to manage accelerators because of the extended use of peripheral component interconnect (PCI) passthrough techniques to attach GPUs to virtual machines (VMs). F…

0301 basic medicine; Schedule; General Computer Science; Computer science; Distributed computing; networking; Cloud computing; 02 engineering and technology; computer.software_genre; 03 medical and health sciences; GPU resource management; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Cloud computing; General Materials Science; Resource management; platform virtualization; business.industry; cloud computing; General Engineering; Virtualization; Shared resource; 030104 developmental biology; Virtual machine; Scalability; GPU cloudification; lcsh:Electrical engineering. Electronics. Nuclear engineering; General-purpose computing on graphics processing units; business; computer; lcsh:TK1-9971; IEEE Access

CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations

2013

We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation which takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation which overlaps computation and communication is described. Extensive performance comparisons of SCOO with other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…
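
To show why atomic operations matter for coordinate-style formats, here is a minimal sketch of SpMV over a plain COO matrix. It does not implement the slicing or partitioning of SCOO, which the excerpt above does not detail. `np.add.at` is used as a serial stand-in for atomic adds: like an atomic add, it accumulates correctly when several nonzeros target the same output row. The function and argument names are assumptions.

```python
import numpy as np

def coo_spmv(rows, cols, vals, x, n_rows):
    """Compute y = A @ x for A stored as COO triplets (rows, cols, vals)."""
    y = np.zeros(n_rows, dtype=np.result_type(vals, x))
    # One product per nonzero; repeated row indices are accumulated,
    # which is the role atomic adds play when nonzeros map to threads.
    np.add.at(y, rows, vals * x[cols])
    return y
```

A plain fancy-indexed `y[rows] += ...` would silently drop contributions for repeated row indices, which is the serial analogue of the race condition that atomics prevent on a GPU.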

Speedup; Computer Networks and Communications; Computer science; Sparse matrix-vector multiplication; Parallel computing; Computer Graphics and Computer-Aided Design; Theoretical Computer Science; Matrix (mathematics); CUDA; Artificial Intelligence; Hardware and Architecture; Benchmark (computing); Multiplication; General-purpose computing on graphics processing units; Software; Sparse matrix; Parallel Computing

GPU-Based Optimisation of 3D Sensor Placement Considering Redundancy, Range and Field of View

2020

This paper presents a novel and efficient solution for the 3D sensor placement problem based on GPU programming and massive parallelisation. Compared to prior art using gradient-search and mixed-integer based approaches, the method presented in this paper returns optimal or good results in a fraction of the time. The presented method allows for redundancy, i.e. requiring selected sub-volumes to be covered by at least n sensors. The presented results are for 3D sensors which have a visible volume represented by cones, but the method can easily be extended to work with sensors having other range and field of view shapes, such as 2D cameras and lidars.

0303 health sciences; 030306 microbiology; Computer science; Volume (computing); 020207 software engineering; Field of view; 02 engineering and technology; 3d sensor; 03 medical and health sciences; Range (mathematics); CUDA; Computer engineering; 0202 electrical engineering electronic engineering information engineering; Redundancy (engineering); Fraction (mathematics); General-purpose computing on graphics processing units; 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA)

CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware

2011

Scanning protein sequence databases is an often repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. In order to exploit the GPU's capabilities for accelerating BLASTP, we have used a compressed deterministic finite state automaton for hit detection as wel…

graphics hardware; Source code; Computer science; media_common.quotation_subject; Graphics hardware; Graphics processing unit; Parallel computing; General Purpose Computation on Graphics Processing Unit (GPGPU); Computational science; Instruction set; CUDA; Genetics; Computer Graphics; Databases Protein; media_common; dynamic programming; Finite-state machine; Sequence database; Applied Mathematics; Proteins; Compute Unified Device Architecture (CUDA); sequence alignment; General-purpose computing on graphics processing units; Algorithms; Software; Biotechnology

Three-dimensional Fuzzy Kernel Regression framework for registration of medical volume data

2013

In this work a general framework for non-rigid 3D medical image registration is presented. It relies on two pattern recognition techniques: kernel regression and fuzzy c-means clustering. The paper provides a theoretical explanation, details the framework, and illustrates its application to implement three registration algorithms for CT/MR volumes as well as single 2D slices. The first two algorithms are landmark-based approaches, while the third one is an area-based technique. The last approach is based on iterative hierarchical volume subdivision, and maximization of mutual information. Moreover, a high performance Nvidia CUDA based implementation of the algorithm is presented. The f…

Computer science; business.industry; Image registration; Mutual information; Machine learning; computer.software_genre; Fuzzy logic; CUDA; Non-rigid registration Fuzzy regression Mutual information Interpolation GPU computing; Artificial Intelligence; Signal Processing; Pattern recognition (psychology); Kernel regression; Computer Vision and Pattern Recognition; Artificial intelligence; Data mining; General-purpose computing on graphics processing units; Cluster analysis; business; computer; Software; Interpolation; Pattern Recognition